Methods and apparatus to estimate the second frequency moment for computer-monitored media accesses

ABSTRACT

A disclosed example includes at least one memory, programmable circuitry, and instructions to cause the programmable circuitry to increment a first value corresponding to a first position in a vector based on a first bit value representation, the first bit value representation corresponding to a first audience member identifier; increment a second value corresponding to a second position in the vector based on a second bit value representation, the second bit value representation corresponding to a second audience member identifier; estimate the second frequency moment of the media impression data using the vector; determine the variance of the second frequency moment; and based on the second frequency moment, schedule a query of the media impression data to be processed by a computer at a future time.

RELATED APPLICATION

This patent arises from a continuation of U.S. patent application Ser. No. 16/917,260, filed Jun. 30, 2020, now U.S. Pat. No. ______, entitled “METHODS AND APPARATUS TO ESTIMATE THE SECOND FREQUENCY MOMENT FOR COMPUTER-MONITORED MEDIA ACCESSES.” Priority to U.S. patent application Ser. No. 16/917,260 is claimed. U.S. patent application Ser. No. 16/917,260 is hereby incorporated herein by reference in its entirety.

FIELD OF THE DISCLOSURE

This disclosure relates generally to monitoring media content, and, more particularly, to methods and apparatus to estimate the second frequency moment for computer-monitored media accesses.

BACKGROUND

Traditionally, monitoring entities determine audience exposure to media based on registered panel members. That is, a monitoring entity such as an audience measurement entity (AME) enrolls people who consent to being monitored into a panel. The AME then monitors those panel members to determine media (e.g., television programs or radio programs, movies, digital versatile disks (DVDs), advertisements, webpages, streaming media, etc.) exposed to those panel members. In this manner, the AME can determine exposure metrics for different media based on the collected media measurement data.

As people are accessing more and more media through digital means (e.g., via the Internet), it is possible for monitoring entities providing such media to track all instances of exposure to media (e.g., on a census-wide level) rather than being limited to exposure metrics based on enrolled panel members.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an example block diagram of an environment to implement a technique for logging impressions of accesses to server-based media.

FIG. 2 is an example block diagram of the audience metrics analyzer of FIG. 1.

FIG. 3 is a flowchart representative of example machine-readable instructions which may be executed to implement the example audience metrics analyzer of FIGS. 1 and/or 2.

FIG. 4 is a flowchart representative of example machine-readable instructions which may be executed to implement the example audience metrics analyzer of FIGS. 1 and/or 2 to estimate the second frequency moment.

FIG. 5 is a block diagram of an example processor platform structured to execute the instructions of FIGS. 3 and/or 4 to implement the audience metrics analyzer of FIGS. 1 and/or 2.

The figures are not to scale. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts. Connection references (e.g., attached, coupled, connected, and joined) are to be construed broadly and may include intermediate members between a collection of elements and relative movement between elements unless otherwise indicated. As such, connection references do not necessarily infer that two elements are directly connected and in fixed relation to each other.

Descriptors “first,” “second,” “third,” etc. are used herein when identifying multiple elements or components which may be referred to separately. Unless otherwise specified or understood based on their context of use, such descriptors are not intended to impute any meaning of priority, physical order or arrangement in a list, or ordering in time but are merely used as labels for referring to multiple elements or components separately for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for ease of referencing multiple elements or components.

DETAILED DESCRIPTION

Providers of media (e.g., content, advertisements, advertisement campaigns) often desire to determine an accurate count representative of a number of users accessing their media (e.g., the number of times the media is accessed). This can be accomplished by generating and/or otherwise identifying media data relating to the number of users accessing their media. Techniques for monitoring user access to Internet-accessible media (e.g., techniques for generating and/or identifying media data), such as digital television (DTV) media and Internet-based digital media, have evolved significantly over the years. Internet-accessible media is also known as digital media. In some examples to generate and/or identify media data, such monitoring is performed primarily through server logs. In particular, entities serving media on the Internet log the number of requests received for their media at their servers. Records or logs to record such requests are example types of media data. Server logs can be tampered with either directly or via zombie programs, which repeatedly request media from the server to increase the server log counts of media. Also, media is sometimes retrieved once, cached locally and then repeatedly accessed from the local cache without involving the server. Both of these scenarios lead to inaccurate audience measurements.

Another technique to generate and/or identify media data includes the inventions disclosed in Blumenau, U.S. Pat. No. 6,108,637, which is hereby incorporated herein by reference in its entirety, that fundamentally changed the way Internet monitoring is performed. For example, Blumenau disclosed a technique wherein Internet media to be tracked is tagged with monitoring instructions. In particular, monitoring instructions are associated with the hypertext markup language (HTML) of the media to be tracked. When a client requests the media, both the media and the monitoring instructions are downloaded to the client. The monitoring instructions are, thus, executed whenever the media is accessed, be it from a server or from a cache. Upon execution, the monitoring instructions cause the client to send or transmit monitoring information from a content display site to a content provider site. The monitoring information is indicative of the manner in which content was displayed.

In some implementations, an impression request can be used to send or transmit monitoring information by a client device using a network communication in the form of a hypertext transfer protocol (HTTP) request (or hypertext transfer protocol secure (HTTPS) request). In this manner, the impression request reports the occurrence of a media impression at the client device. For example, the impression request includes information to report access to a particular item of media (e.g., an advertisement, a webpage, an image, video, audio, etc.). In some examples, the impression request or ping request can also include a cookie previously set in the browser of the client device that may be used to identify a user that accessed the media. That is, impression requests cause monitoring data reflecting information about an access to the media to be sent from the client that downloaded the media to a monitoring entity and can provide a cookie to identify the client device and/or a user of the client device. Sending the monitoring data from the client to the monitoring entity is known as an impression request. In examples disclosed herein, the monitoring entity is the same entity that provides the media to the client. However, in other examples disclosed herein, the monitoring entity may be an audience measurement entity (AME) that did not provide the media to the client and who is a trusted (e.g., neutral) third party for providing accurate usage statistics (e.g., The Nielsen Company, LLC).

There are many monitoring entities operating on the Internet. These monitoring entities provide services to large numbers of subscribers. In exchange for the provision of services, the subscribers register with the monitoring entities. Examples of such monitoring entities include social network sites (e.g., Facebook, Twitter, MySpace, etc.), multi-service sites (e.g., Yahoo!, Google, Axiom, Catalina, etc.), online retailer sites (e.g., Amazon.com, Buy.com, etc.), credit reporting sites (e.g., Experian), streaming media sites (e.g., YouTube, Hulu, etc.), etc. These monitoring entities set cookies and/or other device/user identifiers on the client devices of their subscribers to enable the monitoring entity to recognize their subscribers when they visit their web site.

As used herein, an impression is defined to be an event in which a home or individual accesses and/or is exposed to media (e.g., an advertisement, content, a group of advertisements and/or a collection of content). In Internet media delivery, a quantity of impressions or impression count is the total number of times media (e.g., content, an advertisement, or an advertisement campaign) has been accessed by a web population (e.g., the number of times the media is accessed). In some examples, an impression or media impression is logged by the monitoring entity in response to an impression request from a user/client device that requested the media. For example, an impression request is a message or a communication (e.g., an HTTP request) sent by a client device to an impression collection server of the monitoring entity to report the occurrence of a media impression at the client device. In response, the impression collection server logs an impression in an impression record. Logged impression records based on impression requests are another example type of media data.

In non-Internet media delivery, such as television (TV) media, a television or a device attached to the television (e.g., a set-top-box or other media monitoring device) may monitor media presented by the television. The monitoring generates a log of impressions associated with the media displayed on the television. The television and/or connected device may transmit impression logs to the monitoring entity to log the media impressions. Such an impression log is another example type of media data.

A user of a computing device (e.g., a mobile device, a tablet, a laptop, etc.) and/or a television may be exposed to the same media via multiple devices (e.g., two or more of a mobile device, a tablet, a laptop, etc.) and/or via multiple media types (e.g., digital media available online, digital TV (DTV) media temporarily available online after broadcast, TV media, etc.). For example, a user may start watching The Walking Dead television program on a television as part of TV media, pause the program, and continue to watch the program on a tablet as part of DTV media. In such an example, the exposure to the program may be logged by the monitoring entity twice, once for an impression log associated with the television exposure, and once for the impression request generated by the tablet. Multiple logged impressions associated with the same program and/or same user are defined as duplicate impressions.

As another media monitoring example, the inventions disclosed in Mazumdar et al., U.S. Pat. No. 8,370,489, which is incorporated by reference herein in its entirety, enable a monitoring entity to collect more extensive media data (e.g., Internet usage data) by extending the impression request process to encompass partnered database proprietors and by using such partners as interim data collectors. The inventions disclosed in Mazumdar accomplish this task by structuring the monitoring entity to respond to impression requests from clients (who may not be members of an audience measurement panel and, thus, may be unknown to the monitoring entity) by redirecting the clients from the monitoring entity to a database proprietor, such as a social network site partnered with the monitoring entity, using an impression response. Such a redirection initiates a communication session between the client accessing the tagged media and the database proprietor. For example, the impression response received from the monitoring entity may cause the client to send a second impression request to the database proprietor. In response to receiving this impression request, the database proprietor (e.g., Facebook) can access any cookie it has set on the client to thereby identify the client based on the internal records of the database proprietor. In the event the client corresponds to a subscriber of the database proprietor, the database proprietor logs/records a database proprietor demographic impression in association with the client/user. Impressions logged in this manner generate another example type of media data.

In another media monitoring example, monitoring entities may generate and/or otherwise obtain sketch data. Sketch data provides summary information about an underlying dataset without revealing personally identifiable information (PII) data for individuals that may be included in the dataset. Such sketch data may include a cardinality defining the number of individuals represented by the data, while maintaining the identity of such individuals private. The cardinality of sketch data associated with media exposure is a useful piece of information for a monitoring entity because it provides an indication of the number of audience members exposed to particular media via a platform maintained by the monitoring entity providing the sketch data. Sketch data may be a fifth example type of media data.

As another media monitoring example, the inventions disclosed in Burbank et al., U.S. Pat. No. 8,930,701, which is incorporated by reference herein in its entirety, enable a monitoring entity to track media impressions for media presented by mobile device applications that execute on a mobile device. The inventions disclosed in Burbank enable this by using an application campaign ratings (ACR) identifier to encode multiple device and/or user identifiers found in a mobile device. The one or more encrypted device/user identifier(s) can then be used to retrieve user information for a user of the mobile device by sending the one or more encrypted device/user identifier(s) to one or more corresponding monitoring entities that store user information for their registered users. In the illustrated examples, the monitoring entity has the respective key(s) useable to decrypt device/user identifier(s) pertaining to its services (e.g., wireless carrier services, social networking services, email services, mobile phone ecosystem app or media services, etc.). In this manner, personally-identifying information for particular services will be available to the monitoring entity that provides the particular service. The device/user identifier(s) may be used to generate another example type of media data.

Although examples disclosed herein are described in association with audience metrics related to media impressions, examples disclosed herein may be similarly used for other applications to identify the second frequency moment of a set of data. The datasets themselves need not be audiences. They could be, for example, a username, a full name, a street address, a residence city and/or state, telephone numbers, email addresses, ages, dates of birth, social security numbers, demographic information, bank accounts, lists of purchased items, store visits, traffic patterns, and/or any other personal information provided by subscribers in exchange for services from a monitoring entity. The datasets could be represented as lists of numbers or any other information.

In examples disclosed herein, the media impression data (e.g., any of server logs, impression logs, sketch data, device/user identifier(s), etc.) may be stored in a database and used by the monitoring entities when estimating, determining, and/or otherwise identifying the second frequency moment. As used herein, the second frequency moment may be referred to as the repeat rate indicative of how frequently a data item or a set of data items appears in a data set. The second frequency moment is useful for determining the output size of self-joins (e.g., identifying how many times person A and person B appear together) in databases and/or the surprise index of the media impression data, or any suitable data, in the database. As used herein, the surprise index is defined using Equation 1 below.

$\begin{matrix}{\lambda_{i} = \frac{\rho}{p_{i}},\quad \text{where } \rho = E_{j}\left[ p_{j} \right] = \sum_{j} p_{j}^{2}} & {\text{Equation 1}}\end{matrix}$

In Equation 1, λ_(i) is the surprise index, rho (ρ) is the repeat rate (e.g., Gini's index of homogeneity), and p is a probability in a data set. For example, if there are n possible mutually exclusive outcomes having probabilities p₁, p₂, . . . , p_(n), in the event the i-th outcome occurs, the surprise index is defined by Equation 1. As an example, a monitoring entity may observe a dataset of media impression data (e.g., any of server logs, impression logs, sketch data, device/user identifier(s), etc.) as {f, c, d, f, b, f, e, a, e, d} in which the elements A={a, b, c, d, e, f} occur m_(i)={1, 1, 1, 2, 2, 3} times, respectively. Accordingly, the second frequency moment in such an example may be defined by Equation 2 below.

$\begin{matrix}{F_{2} = \sum_{i = 1}^{|A|} m_{i}^{2} = 1^{2} + 1^{2} + 1^{2} + 2^{2} + 2^{2} + 3^{2} = 20} & {\text{Equation 2}}\end{matrix}$
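The arithmetic of Equations 1 and 2 can be checked with a minimal Python sketch. The function names are hypothetical and are not part of the disclosed examples, and the sketch assumes that each probability p_(i) is taken as the empirical relative frequency of the corresponding element in the observed dataset.

```python
from collections import Counter


def second_frequency_moment(items):
    """Return F2, the sum of squared occurrence counts (Equation 2)."""
    counts = Counter(items)
    return sum(m * m for m in counts.values())


def repeat_rate_and_surprise(items):
    """Return (rho, {element: surprise index}) per Equation 1.

    Assumption: p_i is estimated as count_i / N from the observed data.
    """
    counts = Counter(items)
    total = sum(counts.values())
    probs = {element: m / total for element, m in counts.items()}
    rho = sum(p * p for p in probs.values())
    return rho, {element: rho / p for element, p in probs.items()}


# The example dataset from the description above.
data = ["f", "c", "d", "f", "b", "f", "e", "a", "e", "d"]
f2 = second_frequency_moment(data)
rho, surprise = repeat_rate_and_surprise(data)
print(f2, rho, surprise)
```

Under the stated assumption, the repeat rate equals F₂ divided by the square of the dataset size, so a rarely occurring element such as a has a larger surprise index than a frequently occurring element such as f.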

The second frequency moment is useful to estimate costs (e.g., computer access time, computer processing power, computer resource usage, etc.) to service a query request of data or self-joins of the media impression data in a database. For example, a query request of a set of data may request to identify the quantity of occurrences of either a data element, or a set of data elements, in a database. In this manner, the second frequency moment is useful to estimate the cost (e.g., computer time, computer processing power, computer resource usage, etc.) needed by a computer (e.g., the computer 110) to complete the query request. Examples disclosed herein utilize a vector of counts methodology to estimate the second frequency moment of a dataset.
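The relationship between the second frequency moment and the output size of a self-join can be illustrated with a minimal Python sketch. The row layout and function names are hypothetical; the brute-force join is used only to show that the number of rows produced by joining a table with itself on a key equals the sum of squared key counts.

```python
from collections import Counter
from itertools import product


def self_join_size(rows, key):
    """Count the rows produced by joining `rows` with itself on `key`.

    Brute force over all ordered pairs, for illustration only.
    """
    return sum(1 for left, right in product(rows, rows) if key(left) == key(right))


def second_frequency_moment(keys):
    """Return the sum of squared occurrence counts of the keys."""
    return sum(m * m for m in Counter(keys).values())


# Hypothetical impression records: (record number, element) pairs.
elements = ["f", "c", "d", "f", "b", "f", "e", "a", "e", "d"]
rows = list(enumerate(elements))

join_size = self_join_size(rows, key=lambda row: row[1])
f2 = second_frequency_moment(elements)
print(join_size, f2)
```

Because an element occurring m times pairs with itself m × m times, knowing the second frequency moment in advance indicates how large the self-join result will be without executing the join.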

Examples disclosed herein employ a vector of counts generated using a vector of length k in which each of the k positions (or bins) is populated based on collected media impression data (e.g., any of server logs, impressions, impression logs, sketch data, device/user identifier(s), etc.). For example, a monitoring entity may observe one thousand (1,000) audience members exposed to a type of media. In the event the vector length, k, is determined to be ten (10), each of the one thousand (1,000) elements is subsequently input through a hashing function and the corresponding output of the hashing function identifies which of the k bin positions is to be incremented based on that input element. As used herein, each element in the monitoring data may be input through a hashing function to convert the string representation of the media exposure element into a series of bits (e.g., base two bits, base 16 bits, etc.) representative of a k position or bin. In a numerical example, given the monitoring entity and the one thousand (1,000) exposures to media, an example vector of counts with k=10 bin positions may be generated based on the hashing function as [106, 96, 111, 91, 98, 89, 96, 107, 101, 105].
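The population of a vector of counts can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the description does not name a particular hashing function, so SHA-256 followed by a modulo-k reduction is an illustrative choice, and the identifier strings and function names are hypothetical.

```python
import hashlib


def bin_index(identifier: str, k: int) -> int:
    """Map an identifier string to one of k bin positions.

    Assumption: SHA-256 and a modulo reduction stand in for the
    unspecified hashing function.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % k


def vector_of_counts(identifiers, k: int):
    """Increment the bin selected by the hash of each identifier."""
    vector = [0] * k
    for identifier in identifiers:
        vector[bin_index(identifier, k)] += 1
    return vector


# One thousand hypothetical audience member identifiers and k = 10 bins.
audience = [f"member-{i}" for i in range(1000)]
voc = vector_of_counts(audience, k=10)
print(voc, sum(voc))
```

The counts in the resulting vector sum to the number of input elements, as in the numerical example above, while the individual bin values depend on the hashing function chosen.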

In the vector of counts methodology, k is determined based on a desired or target precision threshold. In examples disclosed herein, k may be determined based on user preference, computer processor limitations, production costs, and/or desired accuracy of result. For example, if k were ten (10), the precision threshold may be a first level, while if k were twenty (20), the precision threshold may be a second level, higher than the first level. In such an example, the first level and the second level indicate levels of precision, where the second level is more precise than the first level.

Examples disclosed herein include methods and apparatus to estimate the second frequency moment of a database using the vector of counts methodology. Further, examples disclosed herein employ methods and apparatus to utilize data from an audience metrics database associated with a single monitoring entity that may include duplicate elements. As used herein, an audience metrics database that may include duplicate elements is configured to store media monitoring data in which the use of the audience metrics database does not require a deduplication method to be performed. For example, the audience metrics database may be a database that includes duplicate data elements (e.g., two elements indicating Person A visited website A in which such two elements are duplicates). In an alternate example, the audience metrics database may be a database that in fact has been previously deduplicated. However, in such an example, examples disclosed herein do not require the additional steps of verifying whether a deduplication method has been applied, whether a deduplication method should be applied, or whether a previously applied deduplication method resulted in accurate values.

FIG. 1 shows an example environment 100 that includes an example monitoring entity 102 and example client devices 108. The example monitoring entity 102 includes an example monitoring entity computer 110 that implements an example audience metrics generator 112 to determine audience sizes based on logged media impressions and an example audience metrics analyzer 124 to estimate the second frequency moment. In the illustrated example of FIG. 1, the monitoring entity computer 110 may also implement an impression monitor system to log media impressions reported by the client devices 108. In the illustrated example of FIG. 1, the client devices 108 may be stationary or portable computers, handheld computing devices, smart phones, Internet appliances, smart televisions, and/or any other type of device that may be connected to the Internet and capable of presenting media.

As used herein, an audience size is defined as a number of audience members exposed to a media item of interest for audience metrics analysis (e.g., determining the second frequency moment). In some examples, an audience size is a unique audience size (without duplicates) that represents a count of unique individuals counted only once regardless of the number of times each individual accessed the media item. In other examples, an audience size represents a count of individuals, some of which are counted two or more times, that accessed the media item. As used herein, a media impression is defined as an occurrence of access and/or exposure to media 114 (e.g., an advertisement, a movie, a movie trailer, a song, a web page banner, etc.). Examples disclosed herein may be used to estimate the second frequency moment of collected media impression data (e.g., any of server logs, impression logs, sketch data, device/user identifier(s), etc.) for media impressions of any one or more media types (e.g., video, audio, a web page, an image, text, etc.). In examples disclosed herein, the media 114 may be content and/or advertisements. Examples disclosed herein are not restricted for use with any particular type of media.
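The distinction between an impression count and a unique audience size can be illustrated with a minimal Python sketch. The log layout, identifiers, and function names are hypothetical and are used only to show the two counts side by side.

```python
# Hypothetical impression log: (audience member identifier, media identifier) pairs.
impression_log = [
    ("member-a", "media-114"),
    ("member-b", "media-114"),
    ("member-a", "media-114"),  # repeat access by member-a
    ("member-c", "other-media"),
    ("member-a", "media-114"),  # another repeat access by member-a
]


def impression_count(log, media_id):
    """Count every logged access to the media item, repeats included."""
    return sum(1 for _, media in log if media == media_id)


def unique_audience_size(log, media_id):
    """Count each individual once, regardless of how often they accessed the item."""
    return len({member for member, media in log if media == media_id})


total = impression_count(impression_log, "media-114")
unique = unique_audience_size(impression_log, "media-114")
print(total, unique)
```

In this sketch the media item has four logged impressions but a unique audience size of two, because one audience member accounts for three of the accesses.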

In the illustrated example of FIG. 1, the monitoring entity 102 distributes the media 114 via the Internet to users that access websites and/or online television services (e.g., web-based TV, Internet protocol TV (IPTV), etc.). In examples disclosed herein, the media 114 is served by media servers of the same internet domains as the monitoring entity 102. For example, the monitoring entity 102 includes corresponding servers 118 that can serve media 114 to their corresponding subscribers via the client devices 108. Examples disclosed herein can be used to generate media impression data (e.g., based on any of server logs, impression logs, sketch data, device/user identifier(s), etc.) corresponding to the media served by the monitoring entity 102. For example, the monitoring entity 102 may use such media impression data to promote their online media serving services (e.g., ad server services, media server services, etc.) to prospective clients. By showing media impression data indicative of audience sizes drawn by the monitoring entity 102, the monitoring entity 102 can sell their media serving services to customers interested in delivering online media to users.

In some examples, the media 114 is presented via the client devices 108. When the media 114 is accessed by the client devices 108, the client devices 108 send an impression request 122 to the server 118 to inform the server 118 of the media accesses. In this manner, the server 118 can log media impressions in impression records of an example audience metrics database 120. In the illustrated example of FIG. 1, the monitoring entity 102 logs demographic impressions corresponding to accesses by the client devices 108 to the media 114. Demographic impressions are impressions logged in association with demographic information collected by the monitoring entity 102 from registered subscribers of their services.

In some examples, the media 114 is encoded to include a media identifier (ID). The media ID may be any identifier or information that can be used to identify the corresponding media 114. In some examples, the media ID is an alphanumeric string or value. In some examples, the media ID is a collection of information. For example, if the media 114 is an episode, the media ID may include program name, season number, and episode number. When the media 114 includes advertisements, such advertisements may be content and/or advertisements. The advertisements may be individual, standalone ads and/or may be part of one or more ad campaigns. In some examples, the ads of the illustrated example are encoded with identification codes (i.e., data) that identify the associated ad campaign (e.g., campaign ID, if any), a creative type ID (e.g., identifying a Flash-based ad, a banner ad, a rich type ad, etc.), a source ID (e.g., identifying the ad publisher), and/or a placement ID (e.g., identifying the physical placement of the ad on a screen). In some examples, advertisements tagged with the monitoring instructions are distributed with Internet-based media content such as, for example, web pages, streaming video, streaming audio, IPTV content, etc. As noted above, methods, apparatus, systems, and/or articles of manufacture disclosed herein are not limited to advertisement monitoring but can be adapted to any type of content monitoring (e.g., web pages, movies, television programs, etc.).

In some examples, media impression data is collected by the server 118 based on beacon requests from tagged media. For example, the media 114 of the illustrated example is tagged or encoded to include monitoring or tag instructions, which are computer executable monitoring instructions (e.g., Java, JavaScript, or any other computer language or script) that are executed by web browsers that access the media 114 via, for example, the Internet. Execution of the monitoring instructions causes the web browser to send the impression request 122 (e.g., also referred to as a tag request) to one or more specified servers of the monitoring entity 102. As used herein, a tag request 122 is used by the client devices 108 to report occurrences of media impressions caused by the client devices accessing the media 114. In the illustrated example, the tag request 122 includes user-identifying information that the monitoring entity 102 can use to identify the subscriber that accessed the media 114. For example, when a subscriber of the monitoring entity 102 logs into a server, the monitoring entity 102 sets a cookie on the client device 108 and maps that cookie to the subscriber's identity/account information at the server 118. In examples disclosed herein, subscriber identity and/or subscriber account information includes personally identifiable information (PII) such as username, full name, street address, residence city and/or state, telephone numbers, email addresses, ages, dates of birth, social security numbers, demographic information, bank accounts, lists of purchased items, store visits, traffic patterns and/or any other personal information provided by subscribers in exchange for services from the monitoring entity 102. By having such PII data mapped to cookies, the monitoring entity 102 can subsequently identify the subscriber based on the cookie to determine when that user accessed different media 114 and to log an impression in association with demographics and/or other PII data of that user. In the illustrated example of FIG. 1, the impression request 122 includes cookies of the client devices 108 to inform the monitoring entity 102 of the particular subscribers that accessed the media 114.

The tag request 122 may be implemented using HTTP requests. However, whereas HTTP requests are network communications that traditionally identify web pages or other resources to be downloaded, the tag request 122 of the illustrated example is a network communication that includes audience measurement information (e.g., ad campaign identification, content identifier, and/or user identification information) as its payload. The server (e.g., the monitoring entity computer 110 and/or the server 118) to which the tag request 122 is directed is programmed to log occurrences of impressions reported by the tag request 122. Further examples of monitoring instructions (e.g., beacon instructions) and uses thereof to collect impression data are disclosed in Mazumdar et al., U.S. Pat. No. 8,370,489, entitled “Methods and Apparatus to Determine Impressions using Distributed Demographic Information,” which is hereby incorporated herein by reference in its entirety.
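One way a tag request might carry audience measurement information as its payload is as query parameters of an HTTP request URL. The following Python sketch is illustrative only: the collection URL, the parameter names (campaign, content, user), and the function name are hypothetical and are not specified by the description, which also contemplates conveying user identification via a cookie rather than a query parameter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_tag_request_url(base_url, campaign_id, content_id, user_id):
    """Encode audience measurement information as URL query parameters.

    Assumption: all three fields travel in the query string; a real
    deployment may instead send user identification in a cookie.
    """
    query = urlencode(
        {"campaign": campaign_id, "content": content_id, "user": user_id}
    )
    return f"{base_url}?{query}"


# Hypothetical collection endpoint and identifiers.
url = build_tag_request_url(
    "https://collector.example/impression", "c-42", "episode 7", "u-123"
)

# A receiving server could recover the payload before logging the impression.
payload = parse_qs(urlsplit(url).query)
print(url)
print(payload)
```

The round trip through `urlencode` and `parse_qs` shows that the reported fields survive URL encoding, which is the property the receiving server relies on when logging the impression.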

In other examples in which the media 114 is accessed by apps on mobile devices, tablets, computers, etc. (e.g., that do not employ cookies and/or do not execute instructions in a web browser environment), media impression data may be collected by the server 118 based on network communications from data collectors installed in such devices. For example, an app publisher (e.g., an app store) can provide a data collector in an install package of an app for installation at the client devices 108. When a client device 108 downloads the app and consents to the accompanying data collector being installed at the client device 108 for purposes of audience/media/data analytics, the data collector can detect when the media 114 is accessed at the client device 108 and cause the client device 108 to send the impression request 122 to report the access to the media 114. In such examples, the data collector can obtain user identifiers and/or device identifiers stored in the client devices 108 and send them in the impression request 122 to enable the monitoring entity 102 to log impressions. Further examples of using a collector in client devices to collect impression data are disclosed in Burbank et al., U.S. Pat. No. 8,930,701, entitled “Methods and Apparatus to Collect Distributed User Information for Media Impressions and Search Terms,” and in Bosworth et al., U.S. Pat. No. 9,237,138, entitled “Methods and Apparatus to Collect Distributed User Information for Media Impressions and Search Terms,” both of which are hereby incorporated herein by reference in their entireties.

In yet other examples, any other technique for collecting media impression data may be used. For example, server logs may be used to log media impressions in response to HTTP requests for media received from client devices. The impressions can be logged by the server in association with user/subscriber demographics based on user/device identifying information in the requests for media.

Examples disclosed herein identify the second frequency moment based onmedia impression data 132 generated by the monitoring entity 102. Themedia impression data 132 may include any of server logs, impressions,impression logs, device/user identifier(s), etc. As used herein, sketchdata is an arrangement of data for use in massive data analyses. Forexample, operations and/or queries that are specified with respect tothe explicit and/or very large subsets, can be processed instead insketch space (e.g., quickly (but approximately) from the much smallersketches representing the actual data). This enables processing eachobserved item of data (e.g., each logged media impression and/oraudience member) quickly in order to create a summary of the currentstate of the actual data. In some examples, the sketch data correspondsto a vector of values generated by processing data entries in thedatabase through one or more hash functions. More particularly, in someexamples, the PII associated with particular audience members is used asinputs for the hash function(s) to generate outputs corresponding to thevalues of the vector for the sketch data. In examples disclosed herein,such inputs obtained from the PII are referred to herein as audiencemember identifiers. For example, audience member identifiers may be anyidentifier suitable for identifying a device and/or a user of the device(e.g., device/user identifier). Examples of audience member identifiers(e.g., device/user identifiers) may include a username, a full name,street address, a residence city and/or state, telephone numbers, emailaddresses, ages, dates of birth, social security numbers, demographicinformation, bank accounts, lists of purchased items, store visits,traffic patterns and/or any other personal information provided bysubscribers in exchange for services from a monitoring entity. 
Inasmuch as hashing functions cannot be reversed, the PII data for the particular audience members is kept private, thereby preserving the anonymity of the underlying raw data represented by the sketch data.

The example audience metrics analyzer 124 is configured to, responsive to the monitoring entity 102 determining the media impression data 132, identify the second frequency moment of the stored media impression data 132. The second frequency moment of the media impression data 132 stored in the audience metrics database 120 provides insight regarding the repeat rate of certain elements in the audience metrics database 120. Such a second frequency moment can be used to estimate query times of the media impression data stored in the audience metrics database 120. For example, the second frequency moment provides insight as to the access costs (e.g., time, processing power, resource usage, etc.) of certain sets of data in the audience metrics database 120. Further in such an example, the audience metrics analyzer 124 may be used by the monitoring entity 102 in response to a query request of data within the audience metrics database 120. For example, one may submit a request to the monitoring entity 102 to identify the quantity in which a specific element, or set of elements, occurs within the audience metrics database 120. In applications in which there are millions of entries, such a query request may require significant time, processing power, energy costs, etc. Accordingly, the monitoring entity 102 can efficiently estimate the second frequency moment using the audience metrics analyzer 124 to better estimate the time, processing power, energy cost, etc., associated with the query request. With this, the monitoring entity 102 can more efficiently allocate resources before and during the query, and in future scheduling, to ensure the query request can be fulfilled. Example operation of the audience metrics analyzer 124 is described below.

FIG. 2 is an example block diagram of the audience metrics analyzer 124 of FIG. 1. In FIG. 2, the audience metrics analyzer 124 includes an example data interface 200, an example comparator 202, an example hashing generator 204, an example vector generator 206, an example second frequency moment generator 208, and an example variance generator 210.

In FIG. 2, the example data interface 200 is configured to obtain the media impression data 132 stored in the audience metrics database 120 of FIG. 1. For example, the data interface 200 may process a query request by identifying and obtaining the media impression data 132 currently stored in the audience metrics database 120. In other examples, the data interface 200 may routinely (e.g., every hour, once a day, once a month, etc.) obtain the media impression data 132 stored in the audience metrics database 120. In such examples, the audience metrics analyzer 124 can estimate (e.g., determine) the second frequency moment associated with the data stored in the audience metrics database 120.

In some examples disclosed herein, prior to obtaining the media impression data 132, the example data interface 200 determines whether a query request is obtained. For example, the data interface 200 determines whether a query request has been transmitted (e.g., by a user of any of the client devices 108 who intends to perform a query of the media impression data 132 (FIG. 1), by the monitoring entity 102 intending to perform such a query, etc.). In the event the data interface 200 determines a query request is not obtained, the data interface 200 waits. In other examples disclosed herein, the data interface 200 may initiate the generation of the second frequency moment in response to a threshold period of time elapsing, regardless of whether a query request is obtained.

The example data interface 200 of the illustrated example of FIG. 2 isimplemented by a logic circuit such as, for example, a hardwareprocessor. However, any other type of circuitry may additionally oralternatively be used such as, for example, one or more analog ordigital circuit(s), logic circuits, programmable processor(s),application specific integrated circuit(s) (ASIC(s)), programmable logicdevice(s) (PLD(s)), field programmable logic device(s) (FPLD(s)),digital signal processor(s) (DSP(s)), etc. In examples disclosed herein,the data interface 200 may be referred to as example means for inputprocessing.

In the example illustrated in FIG. 2, the comparator 202 determines whether a precision threshold (e.g., T_(k)) associated with the determination of the second frequency moment has been determined. For example, the comparator 202 may obtain or have access to a previous precision threshold. In this manner, the comparator 202 identifies a length of the vector based on the precision threshold. For example, if the precision threshold is determined to be high with respect to the number of total elements in the audience metrics database 120 (e.g., a precision threshold of 50, 2,000, 10,000, etc.), the comparator 202 may indicate a high k value (e.g., 50, 2,000, 10,000, etc.) to the hashing generator 204.

In addition, the comparator 202 determines whether the query (e.g., the query request obtained by the data interface 200) can be completed based on the query processing constraints (e.g., an amount of computing resources and/or memory resources available). Example query processing constraints include time-based constraints and processing resource constraints. Example time-based constraints are scheduling constraints (e.g., conflicting high memory-utilization tasks, etc.). Example processing resource constraints are available memory utilization, available processing resources, etc. For example, in response to the query request, the comparator 202 utilizes the second frequency moment later generated by the second frequency moment generator 208 to determine aspects of the media impression data 132 corresponding to the repeat rate, occurrence, etc. In this manner, the comparator 202 can determine whether, based on the second frequency moment, enough processing resources are available to execute the query request. In the event the comparator 202 determines the query request cannot be completed, the comparator 202 schedules the query for a future date and time when the query request can be completed. For example, the comparator 202 may determine that 5 gigabytes of RAM are needed to fulfill the query request. In this example, the comparator 202 determines whether 5 gigabytes of RAM are available and, thus, whether the query request can be completed. In other examples disclosed herein, the comparator 202 may determine whether enough processing cores are available, whether the processor is over-utilized (e.g., 70% of processing resources are allocated to a different task), etc.

While the comparator 202 is described as determining whether the querycan be completed based on an amount of computing resources and/or memoryresources available, the comparator 202 may compare any other suitablemetric in determining whether the query can be completed. For example,the comparator 202 may estimate that the query will take four (4)minutes to complete. In such an example, the comparator 202 maydetermine whether there are future scheduled processing tasks that maycause over-utilization of the processing resources in the audiencemetrics analyzer 124 during execution of the query request. The examplecomparator 202 of the illustrated example of FIG. 2 is implemented by alogic circuit such as, for example, a hardware processor. However, anyother type of circuitry may additionally or alternatively be used suchas, for example, one or more analog or digital circuit(s), logiccircuits, programmable processor(s), application specific integratedcircuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)), fieldprogrammable logic device(s) (FPLD(s)), digital signal processor(s)(DSP(s)), etc. In examples disclosed herein, the comparator 202 may bereferred to as example means for managing.

In the example illustrated in FIG. 2, the hashing generator 204 is configured to input each element obtained by the data interface 200 through a hashing function. In examples disclosed herein, the hashing generator 204 identifies a precision threshold from the comparator 202 and, thus, implements the hashing function based on the precision threshold. In this manner, the output from the hashing generator 204 is a bit-value representation of the input data. Such a bit-value representation of the input data is transmitted to the vector generator 206. In examples disclosed herein, the comparator 202 identifies whether there is/are additional data to be input to the hashing generator 204. The example hashing generator 204 of the illustrated example of FIG. 2 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)), field programmable logic device(s) (FPLD(s)), digital signal processor(s) (DSP(s)), etc. In examples disclosed herein, the hashing generator 204 may be referred to as example means for generating.

The example vector generator 206 is configured to obtain the bit-value representation of the data in the audience metrics database 120 from the hashing generator 204. In this manner, the vector generator 206 increments the kth position in the vector based on the output from the hashing generator 204. The example vector generator 206 iterates through each element until all elements have been input through the hashing generator 204. As a result, the vector generator 206 populates and, thus, generates an example vector of counts. In examples disclosed herein, the vector of counts is a single vector of length k (e.g., determined based on the desired precision threshold), in which an element in the kth position is incremented based on the output from the hashing generator 204. The example vector generator 206 of the illustrated example of FIG. 2 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)), field programmable logic device(s) (FPLD(s)), digital signal processor(s) (DSP(s)), etc. In examples disclosed herein, the vector generator 206 may be referred to as example means for vector generating.
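The hashing and incrementing steps described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the choice of SHA-256, the modulo mapping to a bin, and the names bin_index and vector_of_counts are assumptions introduced here, since the disclosure does not name a particular hashing function.

```python
import hashlib


def bin_index(identifier: str, k: int) -> int:
    """Map an audience member identifier to a bin in 0..k-1.

    Assumption: SHA-256 followed by a modulo stands in for the
    unspecified hashing function of the hashing generator.
    """
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % k


def vector_of_counts(identifiers, k: int) -> list[int]:
    """Stream over identifiers and increment one bin per observed element."""
    if k < 2:
        raise ValueError("k must be at least 2")
    counts = [0] * k
    for identifier in identifiers:
        counts[bin_index(identifier, k)] += 1
    return counts
```

Because each element is processed once and only the k counters are retained, a sketch of this kind can be built while streaming the impression records rather than loading them all into memory.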

In the example illustrated in FIG. 2, the second frequency moment generator 208 is configured to generate a mean-centered vector based on the vector of counts generated by the vector generator 206. For example, the second frequency moment generator 208 may identify the mean of the elements in the vector of counts. In this manner, the second frequency moment generator 208 determines the mean-centered vector by subtracting the mean from each element in the vector of counts. The resulting vector, the mean-centered vector of counts, is then used by the second frequency moment generator 208 to estimate (e.g., determine) the second frequency moment. The example second frequency moment generator 208 of the illustrated example of FIG. 2 is implemented by a logic circuit such as, for example, a hardware processor. However, any other type of circuitry may additionally or alternatively be used such as, for example, one or more analog or digital circuit(s), logic circuits, programmable processor(s), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)), field programmable logic device(s) (FPLD(s)), digital signal processor(s) (DSP(s)), etc. In examples disclosed herein, the second frequency moment generator 208 may be referred to as example means for identifying.

When determining the second frequency moment, the second frequency moment generator 208 determines a sum of all elements, individually squared, within the mean-centered vector of counts. Additionally, the second frequency moment generator 208 determines a ratio of the number of bins, k, divided by the number of bins, k, minus one (1). An example equation illustrative of instructions to be executed by the second frequency moment generator 208 to estimate (e.g., determine) the second frequency moment of the data in the audience metrics database 120 is shown below, in Equation 3.

$\begin{matrix}{{\hat{F}}_{2} = {\frac{k}{k - 1}{\sum_{j = 1}^{k}x_{j}^{2}}}} & {{Equation}3}\end{matrix}$

In Equation 3, the left-hand side corresponds to the estimate of the second frequency moment, F₂, the variable k corresponds to the number of bins determined based on the desired precision threshold, the variable x corresponds to the mean-centered vector of counts (e.g., the mean-centered vector of counts generated by the second frequency moment generator 208), and the variable j is an iterative variable corresponding to a position in the mean-centered vector of counts, x.
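As an illustration of Equation 3, the following sketch mean-centers a vector of counts and applies the k/(k−1) scaling to the sum of squares. The function name estimate_f2 is hypothetical and chosen only for this example.

```python
def estimate_f2(counts):
    """Estimate the second frequency moment from a vector of counts."""
    k = len(counts)
    if k < 2:
        raise ValueError("at least two bins are required (k - 1 is a divisor)")
    mean = sum(counts) / k
    # Mean-centered vector of counts: subtract the mean from every element.
    centered = [c - mean for c in counts]
    # Equation 3: scale the sum of squared centered counts by k / (k - 1).
    return k / (k - 1) * sum(x * x for x in centered)
```

For instance, with counts of 3, 1, 0 and 0 in k = 4 bins, the mean is 1, the mean-centered vector is 2, 0, −1, −1, the sum of squares is 6, and the estimate is (4/3)·6 = 8.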

In the example illustrated in FIG. 2, the variance generator 210 is configured to decide whether to determine the variance of the second frequency moment, F₂, determined by the second frequency moment generator 208. In some examples, the variance generator 210 may decide to determine the variance of the second frequency moment, F₂, in the event a user instruction indicating to identify the variance is received, in the event the audience metrics analyzer 124 determines to identify the accuracy of the second frequency moment, F₂, determined by the second frequency moment generator 208, etc. In the event the variance generator 210 decides to determine the variance of the second frequency moment, F₂, the variance generator 210 determines the variance of the second frequency moment. In operation, the variance generator 210 determines a ratio of (1) the second frequency moment, F₂, squared, times two, and (2) the number of bins, k, minus 1. An example equation illustrative of instructions to be executed by the variance generator 210 to determine the variance of the second frequency moment of the data in the audience metrics database 120 is shown below, in Equation 4.

$\begin{matrix}{{{Var}\left( {\hat{F}}_{2} \right)} = \frac{2F_{2}^{2}}{k - 1}} & {{Equation}4}\end{matrix}$

In Equation 4, the left-hand side corresponds to the variance of the estimate of the second frequency moment, F₂, and the variable k corresponds to the number of bins determined based on the desired precision threshold.
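The ratio described for Equation 4 can be sketched as below. Because the true second frequency moment is not known in practice, this sketch substitutes the estimate for F₂; that substitution, and the function name variance_of_estimate, are assumptions of this illustration rather than statements of the disclosure.

```python
def variance_of_estimate(f2_estimate: float, k: int) -> float:
    """Equation 4 with the estimate plugged in for F2 (an assumption)."""
    if k < 2:
        raise ValueError("variance is undefined for k < 2")
    # Two times the squared second frequency moment, divided by (k - 1).
    return 2.0 * f2_estimate ** 2 / (k - 1)
```

With an estimate of 8 and k = 4 bins, the ratio is 2·64/3, or about 42.67.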

In addition, Equation 5 below shows that the estimate of F₂ is an unbiased estimate. Separately, as Equation 4 indicates, the variance decreases as the length of the vector (e.g., k) increases. Having an unbiased estimator (e.g., as shown in Equation 5 below) is useful to substantially reduce or eliminate bias in estimates generated using examples disclosed herein.

$\begin{matrix}{{E\left( {\hat{F}}_{2} \right)} = F_{2}} & {{Equation}5}\end{matrix}$
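One way to see why the factor k/(k−1) yields an unbiased estimate is sketched below. The sketch assumes an idealized hashing function that assigns each distinct audience member identifier i, occurring f_i times, to one of the k bins independently and uniformly at random; this assumption and the symbols f_i, c_j, h and N are introduced here only for illustration.

$c_{j} = \sum_{i} f_{i}\,\mathbf{1}\left[ h(i) = j \right], \quad N = \sum_{i} f_{i}, \quad x_{j} = c_{j} - \frac{N}{k}, \quad F_{2} = \sum_{i} f_{i}^{2}$

$E\left( c_{j} \right) = \frac{N}{k}, \quad E\left( x_{j}^{2} \right) = {Var}\left( c_{j} \right) = \sum_{i} f_{i}^{2} \cdot \frac{1}{k}\left( 1 - \frac{1}{k} \right) = \frac{k - 1}{k^{2}}F_{2}$

$E\left( \sum_{j = 1}^{k} x_{j}^{2} \right) = \frac{k - 1}{k}F_{2} \quad \Rightarrow \quad E\left( \frac{k}{k - 1}\sum_{j = 1}^{k} x_{j}^{2} \right) = F_{2}$

Under these assumptions, the sum of squared mean-centered counts underestimates F₂ by the factor (k−1)/k on average, and multiplying by k/(k−1) removes that shortfall.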

The example variance generator 210 of the illustrated example of FIG. 2is implemented by a logic circuit such as, for example, a hardwareprocessor. However, any other type of circuitry may additionally oralternatively be used such as, for example, one or more analog ordigital circuit(s), logic circuits, programmable processor(s),application specific integrated circuit(s) (ASIC(s)), programmable logicdevice(s) (PLD(s)), field programmable logic device(s) (FPLD(s)),digital signal processor(s) (DSP(s)), etc. In examples disclosed herein,the variance generator 210 may be referred to as example means forvariance processing.

While an example manner of implementing the audience metrics analyzer124 of FIG. 1 is illustrated in FIG. 2 , one or more of the elements,processes and/or devices illustrated in FIG. 2 may be combined, divided,re-arranged, omitted, eliminated and/or implemented in any other way.Further, the example data interface 200, the example comparator 202, theexample hashing generator 204, the example vector generator 206, theexample second frequency moment generator 208, the example variancegenerator 210, and/or, more generally, the example audience metricsanalyzer 124 of FIG. 1 may be implemented by hardware, software,firmware and/or any combination of hardware, software and/or firmware.Thus, for example, any of the example data interface 200, the examplecomparator 202, the example hashing generator 204, the example vectorgenerator 206, the example second frequency moment generator 208, theexample variance generator 210, and/or, more generally, the exampleaudience metrics analyzer 124 of FIG. 1 could be implemented by one ormore analog or digital circuit(s), logic circuits, programmableprocessor(s), programmable controller(s), graphics processing unit(s)(GPU(s)), digital signal processor(s) (DSP(s)), application specificintegrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s))and/or field programmable logic device(s) (FPLD(s)). When reading any ofthe apparatus or system claims of this patent to cover a purely softwareand/or firmware implementation, at least one of the example datainterface 200, the example comparator 202, the example hashing generator204, the example vector generator 206, the example second frequencymoment generator 208, and/or the example variance generator 210 is/arehereby expressly defined to include a non-transitory computer readablestorage device or storage disk such as a memory, a digital versatiledisk (DVD), a compact disk (CD), a Blu-ray disk, etc. including thesoftware and/or firmware. 
Further still, the example audience metricsanalyzer 124 of FIG. 1 may include one or more elements, processesand/or devices in addition to, or instead of, those illustrated in FIG.2 , and/or may include more than one of any or all of the illustratedelements, processes and devices. As used herein, the phrase “incommunication,” including variations thereof, encompasses directcommunication and/or indirect communication through one or moreintermediary components, and does not require direct physical (e.g.,wired) communication and/or constant communication, but ratheradditionally includes selective communication at periodic intervals,scheduled intervals, aperiodic intervals, and/or one-time events.

Flowcharts representative of example hardware logic, machine readableinstructions, hardware implemented state machines, and/or anycombination thereof for implementing the audience metrics analyzer 124of FIG. 1 are shown in FIGS. 3 and/or 4 . The machine readableinstructions may be one or more executable programs or portion(s) of anexecutable program for execution by a computer processor such as theprocessor 512 shown in the example processor platform 500 discussedbelow in connection with FIG. 5 . The program may be embodied insoftware stored on a non-transitory computer readable storage mediumsuch as a CD-ROM, a floppy disk, a hard drive, a DVD, a Blu-ray disk, ora memory associated with the processor 512, but the entire programand/or parts thereof could alternatively be executed by a device otherthan the processor 512 and/or embodied in firmware or dedicatedhardware. Further, although the example program is described withreference to the flowcharts illustrated in FIGS. 3 and/or 4 , many othermethods of implementing the example audience metrics analyzer 124 mayalternatively be used. For example, the order of execution of the blocksmay be changed, and/or some of the blocks described may be changed,eliminated, or combined. Additionally or alternatively, any or all ofthe blocks may be implemented by one or more hardware circuits (e.g.,discrete and/or integrated analog and/or digital circuitry, an FPGA, anASIC, a comparator, an operational-amplifier (op-amp), a logic circuit,etc.) structured to perform the corresponding operation withoutexecuting software or firmware.

The machine readable instructions described herein may be stored in oneor more of a compressed format, an encrypted format, a fragmentedformat, a compiled format, an executable format, a packaged format, etc.Machine readable instructions as described herein may be stored as data(e.g., portions of instructions, code, representations of code, etc.)that may be utilized to create, manufacture, and/or produce machineexecutable instructions. For example, the machine readable instructionsmay be fragmented and stored on one or more storage devices and/orcomputing devices (e.g., servers). The machine readable instructions mayrequire one or more of installation, modification, adaptation, updating,combining, supplementing, configuring, decryption, decompression,unpacking, distribution, reassignment, compilation, etc. in order tomake them directly readable, interpretable, and/or executable by acomputing device and/or other machine. For example, the machine readableinstructions may be stored in multiple parts, which are individuallycompressed, encrypted, and stored on separate computing devices, whereinthe parts when decrypted, decompressed, and combined form a set ofexecutable instructions that implement a program such as that describedherein.

In another example, the machine readable instructions may be stored in astate in which they may be read by a computer, but require addition of alibrary (e.g., a dynamic link library (DLL)), a software development kit(SDK), an application programming interface (API), etc. in order toexecute the instructions on a particular computing device or otherdevice. In another example, the machine readable instructions may needto be configured (e.g., settings stored, data input, network addressesrecorded, etc.) before the machine readable instructions and/or thecorresponding program(s) can be executed in whole or in part. Thus, thedisclosed machine readable instructions and/or corresponding program(s)are intended to encompass such machine readable instructions and/orprogram(s) regardless of the particular format or state of the machinereadable instructions and/or program(s) when stored or otherwise at restor in transit.

The machine readable instructions described herein can be represented byany past, present, or future instruction language, scripting language,programming language, etc. For example, the machine readableinstructions may be represented using any of the following languages: C,C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language(HTML), Structured Query Language (SQL), Swift, etc.

As mentioned above, the example processes of FIGS. 3 and/or 4 may beimplemented using executable instructions (e.g., computer and/or machinereadable instructions) stored on a non-transitory computer and/ormachine readable medium such as a hard disk drive, a flash memory, aread-only memory, a compact disk, a digital versatile disk, a cache, arandom-access memory and/or any other storage device or storage disk inwhich information is stored for any duration (e.g., for extended timeperiods, permanently, for brief instances, for temporarily buffering,and/or for caching of the information). As used herein, the termnon-transitory computer readable medium is expressly defined to includeany type of computer readable storage device and/or storage disk and toexclude propagating signals and to exclude transmission media.

“Including” and “comprising” (and all forms and tenses thereof) are usedherein to be open ended terms. Thus, whenever a claim employs any formof “include” or “comprise” (e.g., comprises, includes, comprising,including, having, etc.) as a preamble or within a claim recitation ofany kind, it is to be understood that additional elements, terms, etc.may be present without falling outside the scope of the correspondingclaim or recitation. As used herein, when the phrase “at least” is usedas the transition term in, for example, a preamble of a claim, it isopen-ended in the same manner as the term “comprising” and “including”are open ended. The term “and/or” when used, for example, in a form suchas A, B, and/or C refers to any combination or subset of A, B, C such as(1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) Bwith C, and (7) A with B and with C. As used herein in the context ofdescribing structures, components, items, objects and/or things, thephrase “at least one of A and B” is intended to refer to implementationsincluding any of (1) at least one A, (2) at least one B, and (3) atleast one A and at least one B. Similarly, as used herein in the contextof describing structures, components, items, objects and/or things, thephrase “at least one of A or B” is intended to refer to implementationsincluding any of (1) at least one A, (2) at least one B, and (3) atleast one A and at least one B. 
As used herein in the context ofdescribing the performance or execution of processes, instructions,actions, activities and/or steps, the phrase “at least one of A and B”is intended to refer to implementations including any of (1) at leastone A, (2) at least one B, and (3) at least one A and at least one B.Similarly, as used herein in the context of describing the performanceor execution of processes, instructions, actions, activities and/orsteps, the phrase “at least one of A or B” is intended to refer toimplementations including any of (1) at least one A, (2) at least one B,and (3) at least one A and at least one B.

As used herein, singular references (e.g., “a”, “an”, “first”, “second”,etc.) do not exclude a plurality. The term “a” or “an” entity, as usedherein, refers to one or more of that entity. The terms “a” (or “an”),“one or more”, and “at least one” can be used interchangeably herein.Furthermore, although individually listed, a plurality of means,elements or method actions may be implemented by, e.g., a single unit orprocessor. Additionally, although individual features may be included indifferent examples or claims, these may possibly be combined, and theinclusion in different examples or claims does not imply that acombination of features is not feasible and/or advantageous.

FIG. 3 is a flowchart representative of example machine-readableinstructions 300 which may be executed to implement the example audiencemetrics analyzer 124 of FIGS. 1 and/or 2 to estimate the secondfrequency moment. In FIG. 3 , the example data interface 200 (FIG. 2 )determines whether a query request is obtained. (Block 301). Forexample, the data interface 200 determines whether a query has beenreceived (e.g., from the monitoring entity 102) to perform a query ofthe media impression data 132 (FIG. 1 ). In the event the data interface200 determines a query request is not obtained (e.g., the control ofblock 301 returns a result of NO), the instructions wait. Alternatively,in the event the data interface 200 determines a query request isavailable (e.g., the control of block 301 returns a result of YES), theinstructions proceed to block 302. In other examples disclosed herein,the instructions may proceed to block 302 in response to a thresholdperiod of time indicating to execute the instructions 300 regardless ofa query request. For example, the data interface 200 may execute theinstructions illustrated in block 302 every week, every hour, after athreshold period of attempts to identify a query request, etc.

In FIG. 3, the example data interface 200 of FIG. 2 obtains the media impression data 132. (Block 302). For example, the data interface 200 obtains media impression data 132 stored in the audience metrics database 120 of FIG. 1.

The example comparator 202 (FIG. 2 ) determines whether a targetprecision threshold (e.g., T_(k)) associated with the determination ofthe second frequency moment has been determined. (Block 304). Forexample, the comparator 202 may perform an initial check to determinewhether the generation of a precision threshold (e.g., T_(k)) is needed.As an example, in the event a database includes one million rows,depending on the amount of duplication, there can be two extremes (e.g.,all rows are distinct and unique, or there exists one million copies ofthe same item). Because the variance includes (k−1) in the denominator,larger k values result in lower variances. However, in examplesdisclosed herein, the value of k should not be set too small (e.g., ifk=1 the variance is undefined, and every item gets assigned to the samesingular position). In addition, the value of k should not be set toolarge (e.g., k=1,000,000) because it would be the same as actuallykeeping track of every item in memory and how many times they appear. Inthat case, no estimation is needed as it is an exact answer just bycounting the frequencies. Thus, the selection of k becomes an analysisin tradeoffs between higher k values for lower variance, and lower kvalues (compared to the size of the database) for improved use ofcomputer memory and improved computer processing speed. For example, ifthere are 10¹⁰ entries in a database, and all cannot be loaded intomemory, the entries have to be streamed, looking at one item at a time.In such instances, examples disclosed herein estimate the secondfrequency moment also by streaming as each item is allocated to one of kbins independent of when it may be in the actual database. Accordingly,the comparator 202 may determine a k value so that it is not too large(e.g., a k value equal to 1,000,000 results in inefficiencies inanalyzing the bins) and not too small (e.g., a k value equal to 1results in the same bin assigned to each input). 
In this manner, thevalue of k may be determined by the comparator 202 in response toanalyzing the trade-offs for either a higher k and lower variance, or alower k and better computer memory utilization and computer processingspeed. In the event the example comparator 202 determines a targetprecision threshold (e.g., T_(k)) has not been determined (e.g., thecontrol of block 304 returns a result of NO), the instructions wait. Insome examples disclosed herein, there may be a threshold period of timeafter which the comparator 202 assigns an arbitrary number of bins(e.g., a number of bins equal to one tenth of the total elements in theinput data, etc). In other examples disclosed herein, there may be athreshold number of attempts after which the comparator 202 assigns anarbitrary desired precision threshold (e.g., one tenth of the totalelements in the input data, etc).
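The trade-off in selecting k can be made concrete with a small calculation. Dividing the variance 2F₂²/(k−1) by F₂² and taking the square root gives a relative standard error of sqrt(2/(k−1)), which depends only on k. The sketch below inverts that relationship; treating the precision threshold as a target relative standard error is an assumption of this illustration, and the function names are hypothetical.

```python
import math


def relative_standard_error(k: int) -> float:
    """sqrt(Var) / F2 implied by a variance of 2 * F2**2 / (k - 1)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    return math.sqrt(2.0 / (k - 1))


def bins_for_relative_error(target: float) -> int:
    """Smallest k whose relative standard error does not exceed target."""
    if target <= 0:
        raise ValueError("target must be positive")
    return math.ceil(2.0 / target ** 2) + 1
```

Under this reading, a 50% relative standard error needs only k = 9 bins, while a 10% relative standard error needs k = 201 bins, illustrating how a tighter target increases the memory used by the vector of counts.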

In the event the example comparator 202 determines a target precision threshold has been determined (e.g., the control of block 304 returns a result of YES), the hashing generator 204 (FIG. 2) passes an element obtained by the data interface 200 through a hashing function to identify the kth position in the vector of counts. (Block 306). For example, the element is obtained from the audience metrics database 120 and represents a portion of the media impression data 132 stored in the audience metrics database 120. The element may be information from a logged impression record, a portion of a server log, etc. As discussed above, an example hashing function is used by the hashing generator 204 to generate a bit-value representation of the element(s) and/or other portion(s) of the media impression data 132 stored in the audience metrics database 120.

In response, the example vector generator 206 (FIG. 2) increments the element in the kth position identified by the hashing generator 204. (Block 308). For example, the hashing generator 204 may output a hash value of 0.29, which represents the k=3 position (e.g., bin number 3 in the vector). Based on the bit-value representation of the element in the media impression data 132 obtained from the hashing generator 204, the vector generator 206 increments the corresponding kth position of the vector of counts. In examples disclosed herein, the comparator 202 determines whether there are additional elements to analyze. (Block 310). For example, the comparator 202 may determine whether there are additional portion(s) of the media impression data 132 (e.g., additional logged impressions, additional server logs, etc.) to be processed by the hashing generator 204. In the event the comparator 202 determines there are additional elements to analyze (e.g., the control of block 310 returns a result of YES), the instructions return to block 306.
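
The loop of blocks 306 through 310 can be sketched as follows. This is a minimal illustration under stated assumptions: SHA-256 stands in for the unspecified hashing function, the hash is mapped to a value in the unit interval, and bins are numbered from 1 so that a hash value of 0.29 with ten bins lands in bin 3, matching the example above. The function names are hypothetical.

```python
import hashlib
from typing import Iterable, List


def hash_to_unit_interval(identifier: str) -> float:
    """Map an identifier to a value in [0, 1]. SHA-256 is an assumed stand-in hash."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64


def bin_position(hash_value: float, k: int) -> int:
    """Return the 1-indexed bin for a hash value, clamped to the last bin."""
    return min(int(hash_value * k), k - 1) + 1


def build_count_vector(identifiers: Iterable[str], k: int) -> List[int]:
    """Stream identifiers one at a time, incrementing the bin each one hashes to."""
    counts = [0] * k
    for identifier in identifiers:  # one pass; only k counters are held in memory
        position = bin_position(hash_to_unit_interval(identifier), k)
        counts[position - 1] += 1
    return counts


if __name__ == "__main__":
    print(bin_position(0.29, 10))  # 3
    print(build_count_vector(["a", "b", "a", "c", "a"], 4))
```

Because the same identifier always hashes to the same bin, duplicate impressions accumulate in one position, which is what lets the vector of counts carry information about repeat frequency.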

Alternatively, in the event the comparator 202 determines there are no additional elements to analyze (e.g., the control of block 310 returns a result of NO), the vector generator 206 generates the vector of counts. (Block 312). For example, the vector generator 206 may accumulate all elements previously generated into a single vector of length k (e.g., the number of bins determined based on the desired precision threshold). In response to the execution of the instructions illustrated in block 312, the second frequency moment generator 208 determines (e.g., generates) a mean-centered vector based on the vector of counts generated by the vector generator 206. (Block 314). For example, the second frequency moment generator 208 may identify the mean of the elements in the vector of counts. In this manner, the second frequency moment generator 208 determines the mean-centered vector by subtracting the mean from each element in the vector of counts.

The resulting vector, the mean-centered vector of counts, is used by the second frequency moment generator 208 to estimate (e.g., determine) the second frequency moment. (Block 316). For example, the second frequency moment generator 208 can estimate the second frequency moment using the mean-centered vector. Additional description of the instructions represented by block 316 is provided below, in connection with FIG. 4.

In the example illustrated in FIG. 3, the variance generator 210 determines whether to compute the variance of the second frequency moment, F₂, determined by the second frequency moment generator 208. (Block 318). In the event the example variance generator 210 determines to compute the variance of the second frequency moment (e.g., the control of block 318 returns a result of YES), the variance generator 210 determines the variance of the second frequency moment. (Block 320). For example, the variance generator 210 may execute instructions representing Equation 4 to determine the variance of the second frequency moment.
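
Equation 4 is not reproduced in this portion of the description. As a worked illustration only, the variance computation recited in the claims (square the second frequency moment, double the result, and divide by the number of positions in the vector minus one) can be written as shown below, where the hat notation for the estimate is an assumption of this illustration:

$\mathrm{Var}(\hat{F}_2) = \frac{2\,\hat{F}_2^{\,2}}{k - 1}$

For instance, with an estimate of 1,000 and k=201 bins, the variance would be 2 × 1,000² / 200 = 10,000, corresponding to a standard deviation of 100, or 10% of the estimate.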

The example comparator 202 determines whether the query can be completed based on query processing constraints of the audience metrics analyzer 124. (Block 322). For example, in response to the query request, the comparator 202 utilizes the second frequency moment to determine aspects of the media impression data 132 corresponding to the repeat rate, occurrence, etc. In this manner, the example comparator 202 can determine whether, based on the second frequency moment, enough processing resources are available to execute the query request. In the event the example comparator 202 determines the query request cannot be completed (e.g., the control of block 322 returns a result of NO), the comparator 202 schedules the query for a future date and time when the query request can be completed. (Block 324). For example, the comparator 202 may determine that the query request will take four minutes to complete. In the event tasks that require significant processing resources (e.g., a memory migration that, when executed, may utilize 95% of the CPU's processing resources) are scheduled during the next four minutes, the comparator 202 may determine to either delay the query to a future date and time, or delay the tasks that require significant processing resources. Alternatively, in the event the comparator 202 determines that the query request can be completed (e.g., the control of block 322 returns a result of YES), the control proceeds to block 326.
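
The scheduling decision of blocks 322 and 324 can be sketched as follows. This is a hypothetical illustration: the estimated query duration is taken as an input because the description does not specify how it is derived from the second frequency moment, and the representation of resource-heavy tasks as busy time windows, along with the function name, is an assumption. Only the option of delaying the query is shown.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]  # (start, end) of a resource-heavy task


def next_start_time(now: datetime, query_duration: timedelta,
                    busy_windows: List[Window]) -> datetime:
    """Return the earliest start at or after `now` that avoids every busy window."""
    start = now
    for window_start, window_end in sorted(busy_windows):
        if start + query_duration <= window_start:
            break  # the query finishes before this (and every later) window begins
        if start < window_end:
            start = window_end  # overlap: defer the query until the task ends
    return start


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    migration = (datetime(2024, 1, 1, 12, 2), datetime(2024, 1, 1, 12, 10))
    start = next_start_time(now, timedelta(minutes=4), [migration])
    print("run now" if start == now else f"deferred to {start}")
```

In the four-minute example above, a migration that begins within the next four minutes causes the query to be deferred until the migration ends, whereas an empty schedule lets the query run immediately.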

In response to execution of instructions represented by block 324, or in response to the comparator 202 determining the query can be completed within the query processing constraint(s) (e.g., the control of block 322 returns a result of YES), the comparator 202 determines whether to update the desired precision threshold. (Block 326). For example, the comparator 202 may determine not to update the desired precision threshold in the event a more accurate second frequency moment is requested, etc. In the event the example comparator 202 determines to update the desired precision threshold (e.g., the control of block 326 returns a result of YES), the comparator 202 updates the target precision threshold. (Block 328). In such an event, the example comparator 202 communicates the updated target precision threshold to the hashing generator 204.

Alternatively, in the event the example comparator 202 determines not to update the target precision threshold (e.g., the control of block 326 returns a result of NO), or in response to the execution of the instructions represented by block 328, the audience metrics analyzer 124 determines whether to continue operating. (Block 330). In examples disclosed herein, the audience metrics analyzer 124 may determine to continue operating in response to an additional request to determine the second frequency moment, in response to additional data stored in the audience metrics database 122, etc. Alternatively, in examples disclosed herein, the audience metrics analyzer 124 may determine not to continue operating in response to a loss of power, etc.

In the event the example audience metrics analyzer 124 determines to continue operating (e.g., the control of block 330 returns a result of YES), the control returns to block 301. Alternatively, in the event the example audience metrics analyzer 124 determines not to continue operating (e.g., the control of block 330 returns a result of NO), the instructions stop.

FIG. 4 is a flowchart representative of example machine-readable instructions 400 which may be executed to implement the example audience metrics analyzer 124 of FIGS. 1 and/or 2 to estimate the second frequency moment. The example second frequency moment generator 208 squares the element in the jth position of the mean-centered vector of counts. (Block 402). In response, the example second frequency moment generator 208 adds the newly squared element in the jth position to the sum. (Block 404). For example, the initial sum may be zero (0), and, thus, in the first iteration the second frequency moment generator 208 uses the squared element in the jth position as the sum. In subsequent iterations, the newly squared elements in the jth positions are added to the sum by the second frequency moment generator 208.

In response to the execution of the instructions represented by block 404, the example second frequency moment generator 208 increments the variable j. (Block 406). Additionally, the second frequency moment generator 208 determines whether there is another element to analyze. (Block 408). In the event the example second frequency moment generator 208 determines there is another element to analyze (e.g., the control of block 408 returns a result of YES), the instructions return to block 402.

Alternatively, in the event the example second frequency moment generator 208 determines there is not an additional element to analyze (e.g., the control of block 408 returns a result of NO), the second frequency moment generator 208 multiplies the sum by the target precision threshold ratio. (Block 410). For example, the target precision threshold ratio is a ratio of the number of bins determined based on the target precision threshold, k, divided by the number of bins determined based on the target precision threshold, k, minus one (1). For example, the target precision threshold ratio may be determined by the second frequency moment generator 208 by executing instructions representing Equation 6, below.

$\begin{matrix}{\text{Target Precision Threshold Ratio} = \frac{k}{k - 1}} & {\text{Equation 6}}\end{matrix}$

An example equation illustrative of instructions to be executed by the second frequency moment generator 208 to determine the second frequency moment of the data in the audience metrics database 122 is shown above, in Equation 3.

In response to the execution of the instructions represented by block 410, control returns to block 318 of FIG. 3.
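
The flow of FIG. 4, together with the mean-centering of block 314, can be sketched as follows. This is a minimal illustration of the steps as described (mean-center the counts, sum the squares, multiply by k/(k−1) per Equation 6); Equation 3 itself is not reproduced in this portion of the description, and the function name is hypothetical.

```python
from typing import Sequence


def estimate_second_frequency_moment(counts: Sequence[int]) -> float:
    """Estimate the second frequency moment from a vector of k bin counts."""
    k = len(counts)
    if k < 2:
        raise ValueError("at least two bins are required")
    mean = sum(counts) / k                    # block 314: mean of the vector of counts
    total = 0.0                               # the initial sum is zero
    for count in counts:                      # blocks 402-408: iterate over positions j
        total += (count - mean) ** 2          # square the mean-centered element, add to sum
    return total * k / (k - 1)                # block 410: multiply by k / (k - 1)


if __name__ == "__main__":
    print(estimate_second_frequency_moment([3, 1, 0, 2]))
```

For the counts [3, 1, 0, 2], the mean is 1.5, the squared deviations sum to 5.0, and multiplying by 4/3 gives an estimate of approximately 6.67.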

FIG. 5 is a block diagram of an example processor platform 500 structured to execute the instructions of FIGS. 3 and/or 4 to implement the audience metrics analyzer 124 of FIGS. 1 and/or 2. The processor platform 500 can be, for example, a server, a personal computer, a workstation, a self-learning machine (e.g., a neural network), a mobile device (e.g., a cell phone, a smart phone, a tablet such as an iPad™), a personal digital assistant (PDA), an Internet appliance, a DVD player, a CD player, a digital video recorder, a Blu-ray player, a gaming console, a personal video recorder, a set top box, a headset or other wearable device, or any other type of computing device.

The processor platform 500 of the illustrated example includes a processor 512. The processor 512 of the illustrated example is hardware. For example, the processor 512 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example data interface 200, the example comparator 202, the example hashing generator 204, the example vector generator 206, the example second frequency moment generator 208, and/or the example variance generator 210.

The processor 512 of the illustrated example includes a local memory 513 (e.g., a cache). The processor 512 of the illustrated example is in communication with a main memory including a volatile memory 514 and a non-volatile memory 516 via a bus 518. The volatile memory 514 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 516 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 514, 516 is controlled by a memory controller.

The processor platform 500 of the illustrated example also includes an interface circuit 520. The interface circuit 520 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.

In the illustrated example, one or more input devices 522 are connected to the interface circuit 520. The input device(s) 522 permit(s) a user to enter data and/or commands into the processor 512. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.

One or more output devices 524 are also connected to the interface circuit 520 of the illustrated example. The output devices 524 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-plane switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 520 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.

The interface circuit 520 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 526. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.

The processor platform 500 of the illustrated example also includes one or more mass storage devices 528 for storing software and/or data. Examples of such mass storage devices 528 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.

Machine executable instructions 532 represented in FIGS. 3 and/or 4 may be stored in the mass storage device 528, in the volatile memory 514, in the non-volatile memory 516, and/or on a removable non-transitory computer readable storage medium such as a CD or DVD.

From the foregoing, it will be appreciated that example methods, apparatus and articles of manufacture have been disclosed that estimate the second frequency moment of data. Examples disclosed herein employ methods and apparatus to estimate the second frequency moment of data from a non-deduplicated database associated with a single media provider. Examples disclosed herein employ a vector of counts methodology to estimate the second frequency moment. Disclosed methods, apparatus and articles of manufacture improve the efficiency of using a computing device by estimating the second frequency moment to determine query times and costs associated with a set of data. In applications in which there are millions of entries, such a query request may require significant computer processing time, computer processing power, computer energy costs, etc. Accordingly, examples disclosed herein can efficiently estimate the second frequency moment to better estimate the computer processing time, computer processing power, computer energy cost, etc., associated with servicing the query request. Examples disclosed herein can more efficiently allocate resources prior to, during, and in future scheduling to service the query request. For example, examples disclosed herein can determine whether a query can be serviced within query processing constraints that may be selected to improve the operation of a computer by not consuming so many computing resources that other processes on the same computer would exhibit poor performance or be unable to function. Examples disclosed herein may also be used to improve operation of a computer by deferring servicing of queries to a later time when a query cannot be immediately serviced due to query processing constraints not being satisfied by immediately available computing resources and/or computer processing time. The disclosed methods, apparatus and articles of manufacture are accordingly directed to one or more improvement(s) in the functioning of a computer.

Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.

Example methods, apparatus, systems, and articles of manufacture to estimate the second frequency moment for computer-monitored media accesses are disclosed herein. Further examples and combinations thereof include the following:

Example 1 includes an apparatus to determine whether a computer can complete a query request associated with media impression data based on a second frequency moment of the media impression data, the apparatus comprising a hashing generator to, in response to a query request, input a first audience member identifier and a second audience member identifier of the media impression data through a hashing function, an output of the hashing generator being a first bit-value representation of the first audience member identifier and a second bit-value representation of the second audience member identifier, a vector generator to, responsive to obtaining the output of the hashing generator, increment first and second values of corresponding first and second positions in a vector based on the first and second bit-value representations, respectively, a second frequency moment generator to estimate the second frequency moment of the media impression data using the vector, and a comparator to, using the second frequency moment, determine whether the computer can complete the query request of the media impression data based on query processing constraints.

Example 2 includes the apparatus of example 1, wherein the comparator is to determine a precision threshold, the precision threshold to be transmitted to the hashing generator for use in determining a number of positions of the vector.

Example 3 includes the apparatus of example 1, wherein the second frequency moment generator is to estimate the second frequency moment by squaring the first value stored in the first position, squaring the second value stored in the second position, summing the first and second values, and multiplying the sum of the first and second values by a precision threshold ratio.

Example 4 includes the apparatus of example 3, wherein the first and second values correspond to the first and second positions, respectively, in a mean-centered vector, the mean-centered vector being generated based on the vector.

Example 5 includes the apparatus of example 1, wherein the second frequency moment generator is to determine a mean-centered vector based on the vector, the second frequency moment generator to use the mean-centered vector to estimate the second frequency moment.

Example 6 includes the apparatus of example 1, further including a data interface to obtain the media impression data from an audience metrics database, the media impression data corresponding to audience member totals for media.

Example 7 includes the apparatus of example 6, wherein the media impression data includes a plurality of audience member identifiers, the first audience member identifier of the plurality of audience member identifiers being a duplicate of the second audience member identifier of the plurality of audience member identifiers.

Example 8 includes the apparatus of example 1, wherein the query processing constraints include at least one of a scheduling constraint, available memory utilization, or available processing resources.

Example 9 includes a non-transitory computer readable medium comprising computer readable instructions which, when executed, cause a processor to at least in response to a query request, input a first audience member identifier and a second audience member identifier of media impression data through a hashing function, an output being a first bit-value representation of the first audience member identifier and a second bit-value representation of the second audience member identifier, increment first and second values of corresponding first and second positions in a vector based on the first and second bit-value representations, respectively, estimate the second frequency moment of the media impression data using the vector, and using the second frequency moment, determine whether the processor can complete the query request of the media impression data based on query processing constraints.

Example 10 includes the computer readable medium of example 9, wherein the instructions, when executed, cause the processor to determine a precision threshold, the precision threshold to be used in determining a number of positions of the vector.

Example 11 includes the computer readable medium of example 9, wherein the instructions, when executed, cause the processor to estimate the second frequency moment by squaring the first value stored in the first position, squaring the second value stored in the second position, summing the first and second values, and multiplying the sum of the first and second values by a precision threshold ratio.

Example 12 includes the computer readable medium of example 11, wherein the first and second values correspond to the first and second positions, respectively, in a mean-centered vector, the mean-centered vector being generated based on the vector.

Example 13 includes the computer readable medium of example 9, wherein the instructions, when executed, cause the processor to determine a mean-centered vector based on the vector, and estimate the second frequency moment using the mean-centered vector.

Example 14 includes the computer readable medium of example 9, wherein the instructions, when executed, cause the processor to obtain the media impression data from an audience metrics database, the media impression data corresponding to audience member totals for media.

Example 15 includes the computer readable medium of example 14, wherein the media impression data includes a plurality of audience member identifiers, the first audience member identifier of the plurality of audience member identifiers being a duplicate of the second audience member identifier of the plurality of audience member identifiers.

Example 16 includes the computer readable medium of example 9, wherein the query processing constraints include at least one of a scheduling constraint, available memory utilization, or available processing resources.

Example 17 includes a method to determine whether a computer can complete a query request associated with media impression data based on a second frequency moment of the media impression data, the method comprising in response to a query request, inputting a first audience member identifier and a second audience member identifier of media impression data through a hashing function, an output being a first bit-value representation of the first audience member identifier and a second bit-value representation of the second audience member identifier, incrementing first and second values of corresponding first and second positions in a vector based on the first and second bit-value representations, respectively, estimating the second frequency moment of the media impression data using the vector, and using the second frequency moment, determining whether the computer can complete the query request of the media impression data based on query processing constraints.

Example 18 includes the method of example 17, further including determining a precision threshold, the precision threshold to be used in determining a number of positions of the vector.

Example 19 includes the method of example 17, further including estimating the second frequency moment by squaring the first value stored in the first position, squaring the second value stored in the second position, summing the first and second values, and multiplying the sum of the first and second values by a precision threshold ratio.

Example 20 includes the method of example 19, wherein the first and second values correspond to the first and second positions, respectively, in a mean-centered vector, the mean-centered vector being generated based on the vector.

The following claims are hereby incorporated into this Detailed Description by this reference, with each claim standing on its own as a separate embodiment of the present disclosure.

1. A system to determine a variance of a second frequency moment of media impression data, the system comprising: at least one memory; programmable circuitry; and instructions to cause the programmable circuitry to: increment a first value corresponding to a first position in a vector based on a first bit value representation, the first bit value representation corresponding to a first audience member identifier; increment a second value corresponding to a second position in the vector based on a second bit value representation, the second bit value representation corresponding to a second audience member identifier; estimate the second frequency moment of the media impression data using the vector; determine the variance of the second frequency moment; and based on the second frequency moment, schedule a query of the media impression data to be processed by a computer at a future time.
 2. The system of claim 1, wherein the programmable circuitry is to determine the variance of the second frequency moment by: determining a first result by squaring the second frequency moment; doubling the first result; determining a second result by subtracting one from a number of positions in the vector; and determining a ratio of the first result and the second result.
 3. The system of claim 2, wherein the number of positions in the vector is based on a target precision threshold.
 4. The system of claim 3, wherein the programmable circuitry is to determine whether to update the target precision threshold after determining whether the computer can complete the query of the media impression data based on a processing constraint.
 5. The system of claim 4, wherein, based on a request for a more accurate second frequency moment, the programmable circuitry is to determine not to update the target precision threshold.
 6. The system of claim 4, wherein the processing constraint includes at least one of a scheduling constraint, available memory utilization, or an available processing resource.
 7. The system of claim 1, wherein the programmable circuitry is to obtain the media impression data from an audience metrics database, the media impression data corresponding to an audience member total for a media item.
 8. A non-transitory computer readable medium comprising computer readable instructions which, when executed, cause processor circuitry to at least: increment, based on a first bit value representation, a first value corresponding to a first position in a vector, the first bit value representation corresponding to a first audience member; increment, based on a second bit value representation, a second value corresponding to a second position in the vector, the second bit value representation corresponding to a second audience member; estimate a second frequency moment of media impression data using the vector; determine a variance of the second frequency moment; and based on the second frequency moment, schedule a date and time for a computer to complete a query request of the media impression data.
 9. The non-transitory computer readable medium of claim 8, wherein the instructions are to cause the processor circuitry to determine the variance of the second frequency moment by: determining a first result by squaring the second frequency moment; doubling the first result; determining a second result by subtracting one from a number of positions in the vector; and determining a ratio of the first result and the second result.
 10. The non-transitory computer readable medium of claim 9, wherein the number of positions in the vector is based on a target precision threshold.
 11. The non-transitory computer readable medium of claim 10, wherein the instructions are to cause the processor circuitry to determine whether to update the target precision threshold after determining whether the computer can complete the query request of the media impression data based on a processing constraint.
 12. The non-transitory computer readable medium of claim 11, wherein, based on a request for a more accurate second frequency moment, the instructions are to cause the processor circuitry to determine not to update the target precision threshold.
 13. The non-transitory computer readable medium of claim 11, wherein the processing constraint includes at least one of a scheduling constraint, available memory utilization, or an available processing resource.
 14. The non-transitory computer readable medium of claim 8, wherein the instructions are to cause the processor circuitry to obtain the media impression data from an audience metrics database, the media impression data corresponding to an audience member total for a media item.
 15. A method comprising: incrementing, by executing an instruction with a processor, a first value corresponding to a first position in a vector, the incrementing of the first value based on a first bit value representation, the first bit value representation corresponding to a first audience member identifier; incrementing, by executing an instruction with the processor, a second value corresponding to a second position in the vector, the incrementing of the second value based on a second bit value representation, the second bit value representation corresponding to a second audience member; estimating, by executing an instruction with the processor, a second frequency moment of media impression data using the vector; determining, by executing an instruction with the processor, a variance of the second frequency moment; and based on the second frequency moment, executing a query of the media impression data with the processor at a scheduled time.
 16. The method of claim 15, further including: determining a first result by squaring the second frequency moment; doubling the first result; determining a second result by subtracting one from a number of positions in the vector; and determining a ratio of the first result and the second result.
 17. The method of claim 16, wherein the number of positions in the vector is based on a target precision threshold.
 18. The method of claim 17, further including determining whether to update the target precision threshold after determining whether the processor can complete the query of the media impression data based on a query processing constraint.
 19. The method of claim 18, further including, based on a request for a more accurate second frequency moment, determining not to update the target precision threshold.
 20. The method of claim 18, wherein the query processing constraint includes at least one of a scheduling constraint, available memory utilization, or an available processing resource.
 21. The method of claim 15, further including obtaining the media impression data from an audience metrics database, the media impression data corresponding to an audience member total for a media item.